MEDIA STREAM MIXING



TECHNICAL FIELD OF THE INVENTION 

The present invention relates to real time mixing of at least two media streams in a portable
communication device. More particularly, it relates to a method and a device for real time 
mixing of at least two media streams, for providing a real time transmitted output media 
stream. 

DESCRIPTION OF RELATED ART 

Most third generation mobile terminals will have a Video Telephony (VT) application 
implemented, which is based on the 3GPP specification 324M. This VT-application enables a 
person-to-person connection with communication including real-time voice and video 
information. These applications comprise recording and generating of one single video 
stream containing both audio and image information. 

If it were possible to combine two different media streams in a portable communication 
device, a number of attractive functions could be provided such as exchanging voice from 
one stream with voice from another, replacing image information of one stream with image
information of the other etc. 

One interesting way of using VT would be to use so called "show and tell". This means that 
when playing a recorded video including audio and image information, voice (audio) is 
simultaneously added to this stream. 

Furthermore, some consumers might be concerned about apparently being filmed at 
locations where it may be inappropriate or where the other party is not allowed to see the
location because of security reasons. The apparent real time movie may for instance be a
combination of two different movies, one of which shows the location and the other shows 
the consumer. 

This invention concerns how to increase the usage of a mobile VT application and how to be 
able to handle integrity issues when used in a video phone.

There is thus a need for a method and a device that can provide an output media stream 
that is based on two separate input streams, where at least one is a real-time stream. 

SUMMARY OF INVENTION

This invention is thus directed towards solving the problem of providing a real time 
synchronized output media stream being transmitted from a portable communication device, 
where said output media stream is a mixture of a first (real time) media stream and a
second media stream. 

This is achieved by generating in real time a first media stream in the portable
communication device, and combining in real time the first media stream with a second 
media stream, for forming the output media stream. 

One object of the present invention is to provide a method for forming a real time output 
media stream made up of a first real-time media stream and a second media stream. 

According to a first aspect of this invention, this object is achieved by a method for forming 
an output media stream to be transmitted during a communication session from a portable
communication device, wherein said media stream comprises signals of a first type, 
comprising the steps of: 

generating in real time a first media stream in the portable communication
device, and

combining in real time the first media stream with a second media stream, 
for forming the output media stream. 

A second aspect of the present invention is directed towards a method including the 
features of the first aspect, wherein said output media stream comprises signals of a second 
type. 

A third aspect of the present invention is directed towards a method including the features 
of the first aspect, further comprising the step of transmitting said output media stream. 

A fourth aspect of the present invention is directed towards a method including the features
of the first aspect, further comprising the step of establishing a connection with another 
device. 

A fifth aspect of the present invention is directed towards a method including the features of
the fourth aspect, wherein said connection is a circuit-switched connection. 

A sixth aspect of the present invention is directed towards a method including the features 
of the first aspect, in which at least one of the steps is dependent on input data from a user 
of said portable communication device. 

A seventh aspect of the present invention is directed towards a method including the 
features of the first aspect, wherein the step of combining comprises combining signals of a 
first type from the first media stream with signals of a second type from the second media
stream. 



An eighth aspect of the present invention is directed towards a method including the 
features of the first aspect, wherein the step of combining comprises combining signals of a 
first type from the first media stream with signals of the first type from the second media 
stream. 

A ninth aspect of the present invention is directed towards a method including the features 
of the eighth aspect, wherein the step of combining further comprises combining signals of a
second type from the first media stream with the signals from the second media stream. 

A tenth aspect of the present invention is directed towards a method including the features 
of the eighth aspect, wherein the step of combining further comprises combining signals 
from the first media stream with signals of the second type from the second media stream. 

An eleventh aspect of the present invention is directed towards a method including the 
features of the tenth aspect, wherein the step of combining further comprises combining 
signals of the second type from the first media stream with signals from the second media 
stream. 

A twelfth aspect of the present invention is directed towards a method including the features 
of the eleventh aspect, wherein the step of combining further comprises the step of: 

delaying, prior to combining, signals of one type of the second media stream, 
in relation to the other type of signals of the same stream, for providing 
synchronized signals from the second media stream within the output media 
stream. 

A thirteenth aspect of the present invention is directed towards a method including the 
features of the tenth aspect, wherein the step of combining further comprises independently 
combining signals of the first type and signals of the second type. 

A fourteenth aspect of the present invention is directed towards a method including the 
features of the ninth aspect or the eleventh aspect, wherein the step of combining further 
comprises delaying signals of one type within the output media stream, in relation to the 
other type of signals of the same stream, for providing synchronized signals from the first 
media stream within the output media stream. 

A fifteenth aspect of the present invention is directed towards a method including the 
features of the ninth aspect, wherein the step of combining signals, where the signals of the 
first type are audio signals, further comprises the step of superposing the signals of said
first type.

A sixteenth aspect of the present invention is directed towards a method including the
features of the fifteenth aspect, wherein the step of superposing comprises weighting 
properties of the audio signals from the first media stream and the second media stream. 

A seventeenth aspect of the present invention is directed towards a method including the 
features of the ninth aspect, wherein the step of combining signals, where the signals of the 
first type are image signals, further comprises the step of blending the signals of the first 
type. 

An eighteenth aspect of the present invention is directed towards a method including the 
features of the seventeenth aspect, wherein the step of blending comprises weighting 
properties of the image signals from the first media stream and the second media stream. 

A nineteenth aspect of the present invention is directed towards a method including the 
features of the sixteenth aspect, wherein weighting properties includes varying the 
proportion of signals from the first media stream in relation to the proportion of signals from 
the second media stream. 

A twentieth aspect of the present invention is directed towards a method including the 
features of the nineteenth aspect, wherein the weighting properties is dependent on input 
data of a user of said portable communication device. 

A twenty-first aspect of the present invention is directed towards a method including the 
features of the nineteenth aspect, wherein varying said proportions comprises varying
each proportion within the range between 0 and 100%. 

Another object of the present invention is to provide a portable communication device for 
forming a real time output media stream made up of a first real time media stream and a 
second media stream. 

According to a twenty-second aspect of the present invention, this object is achieved by a 
portable communication device for forming an output media stream to be transmitted during 
a communication session from said portable communication device, wherein said output 
media stream comprises signals of a first type, said portable communication device 
comprising: 

a generating unit provided for generating a first media stream in real time, 
a first combining unit, connected to said generating unit, provided for 
combining in real time the first media stream with a second media stream, 
and 

a control unit controlling the generating unit and the combining unit. 

A twenty-third aspect of the present invention is directed towards a portable communication 
device including the features of the twenty-second aspect, for forming an output media 
stream to be transmitted during a communication session from said portable communication
device, wherein the first combining unit is provided for combining signals of the first type of
both the first and second media streams, wherein the output media stream comprises
signals of the first type and a second type, further comprising:

a second combining unit,
for combining signals of the second type of the first media stream and signals of the second
type of the second media stream, for providing the output media stream comprising signals
of both the first and second type.

A twenty-fourth aspect of the present invention is directed towards a portable 
communication device including the features of the twenty-second aspect, further 
comprising: 

a memory unit for providing storage for the second media stream. 

A twenty-fifth aspect of the present invention is directed towards a portable communication 
device including the features of the twenty-second aspect, further comprising: 

a user input interface for providing user input and connected to the control 
unit so that the generating unit and all combining units are controlled in 
dependence of user input. 

A twenty-sixth aspect of the present invention is directed towards a portable communication 
device including the features of the twenty-third aspect, further comprising:

a multiplexing unit (220) for providing synchronization of signals of one type 
from the first media stream in relation to signals of the other type from the same
first media stream, within the output media stream. 

A twenty-seventh aspect of the present invention is directed towards a portable 
communication device including the features of the twenty-third aspect, further
comprising:

a delaying unit for providing synchronized signals within the output media 
stream. 

A twenty-eighth aspect of the present invention is directed towards a portable 
communication device including the features of the twenty-seventh aspect, where the 
delaying unit provides synchronization of signals from the second media stream prior to 
combining with the first stream. 

A twenty-ninth aspect of the present invention is directed towards a portable communication
device including the features of the twenty-eighth aspect, where the delaying unit provides
synchronization of signals of one type in relation to signals of the other type from the same
second media stream. 



The present invention provides an output media stream where a first real-time media 
stream is combined with a second media stream. This has the advantage that the two media 
streams can be combined in a number of ways for providing a number of different attractive 
functions, for example:

A user of a mobile device can, instead of separately sending video camera images to a 
communicating party, transmit a pre-recorded video or sound, while mixing said pre- 
recorded video with real time voice or audio information in order to provide the 
communicating party the perception that the user is located at another place than he 
actually is. Such an effect can further be enhanced by mixing moving image information, 
such as the face of the user, into said pre-recorded video. 

Upon receiving a video phone call the user can, instead of sending real time video camera 
images from his camera to the calling party, decide to play a pre-recorded video answering
message, containing moving or still pictures, stored in memory, in order to provide a mobile 
video answering machine. 

During a conversation a user of the communication device can share content information 
such as video or still images instantly by providing it in the output media stream of a VT- 
session. This thus allows for simultaneous multimedia.

Another example is the sending of a pre-recorded video file during start up of a VT
session, where said file can contain advertisements, qualifying for reduced communication 
tariffs. 

It should be emphasized that the term "comprises/comprising" when used in this 
specification is taken to specify the presence of stated features, integers, steps or 
components, but does not preclude the presence or addition of one or more other features, 
integers, steps, components or groups thereof. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will now be described in more detail in relation to the enclosed 
drawings, in which: 

fig. 1 shows a method for providing in real time a synchronized output media stream that is 
transmitted from a portable communication device, where said output media stream is a 
mixture of a first real time media stream and a second media stream; and 

fig. 2 illustrates a portable communication device for providing the synchronized output
media stream that is generated according to the method in fig. 1.

DETAILED DESCRIPTION OF THE EMBODIMENTS 

The present invention relates to the forming of a synchronized output media stream to be
transmitted during a communication session from a portable communication device.

The features according to this preferred embodiment will be available in the 3G-324M 
application of communication devices. In order to achieve this the flexibility of the H.245
protocol will be utilized together with updated SW/HW architecture. 

Reference will now be given to figs. 1 and 2, illustrating a method for providing a 
synchronized output media stream in real time that is transmitted from a portable 
communication device and a portable communication device according to the present 
invention, respectively.

According to this embodiment said method starts by establishing a connection between the
portable communication device 200 and another communication device, with which the user
of the portable communication device would like to establish a VT communication session.
This is done through the user selecting a VT-session via a user input interface 202. Upon the
selection a control unit 204 makes a transmitting unit 222 establish the connection with the
other device. Generating of the first media stream, having both audio and image
information, i.e. signals of a first and second type, step 104, is then performed by an audio
generating unit 206 and an image generating unit 208, both controlled by the control unit
204. This is done through recording images via a camera comprised in the image generating
unit 208 and audio via a microphone comprised within the audio generating unit 206.

According to this embodiment the first media stream comprises both audio and image
information. Now, providing of the second media stream, step 106, is performed by
obtaining said media stream from a memory unit 210. This is also done under the control
of the control unit, depending on user input. This memory unit is in this preferred
embodiment an internal memory comprised in the portable communication device
200. As the second media stream contains multiplexed audio and image information, this
stream is demultiplexed to separate these two types of signals. This is performed in a
demultiplexing unit 212. The demultiplexing unit 212 performs decoding of image formats
in order to obtain a format that is suitable for mixing according to this preferred
embodiment. A suitable format is for instance the YUV format. This demultiplexing unit 212
also performs decoding of audio formats in order to obtain a suitable format, for instance
the PCM format. The demultiplexing unit 212 further has a bit-rate converting capability for
converting audio and/or image information to facilitate the steps of combining
information of the same type in combining units 216 and 218. The first and second media
streams now contain decoded separated types of signals, containing audio and image
information respectively.
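
The separation performed by the demultiplexing unit can be illustrated with a minimal Python sketch. The packet layout (a type tag, a timestamp in milliseconds and a payload) and the names `Packet` and `demultiplex` are assumptions made purely for illustration; the specification does not define a container format, and the decoding to YUV or PCM is not modeled here.

```python
from typing import List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical unit of a multiplexed stream (not defined in the specification)."""
    kind: str          # "audio" or "image"
    timestamp_ms: int  # presentation time of the payload
    payload: bytes


def demultiplex(stream: List[Packet]) -> Tuple[List[Packet], List[Packet]]:
    """Split a multiplexed stream into audio and image packets, preserving order."""
    audio = [p for p in stream if p.kind == "audio"]
    image = [p for p in stream if p.kind == "image"]
    return audio, image


# A stored second media stream with interleaved audio and image packets.
stored = [
    Packet("image", 0, b"frame0"),
    Packet("audio", 0, b"pcm0"),
    Packet("audio", 20, b"pcm1"),
    Packet("image", 40, b"frame1"),
]
audio_packets, image_packets = demultiplex(stored)
```

Keeping the two signal types in separate sequences is what allows each type to be processed, delayed and combined independently in the later steps.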

The above mentioned demultiplexing and decoding of the different signal types, according to
the present invention, consume different amounts of time. In this preferred embodiment
the image processing path requires more time than the audio processing path, which is the
general case. For obtaining a subsequent output media stream comprising synchronized
audio and image information from the second stream, audio information from this second
stream is subjected to delaying, step 108, by a delaying unit 214. The amount of delay used
is further determined by the control unit, based on the difference in time of the different
processing steps. However, the amount of delay used is also dependent on any time
difference between audio and image information of the first media stream. The delaying unit
214, effective on the second media stream, hence also has to compensate for a portion of
subsequent synchronizing by a multiplexing unit 220, effective on the output media stream,
which portion is due to any timing difference between audio and image information within
the first media stream.
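
As a rough illustration of how the control unit might derive the delay, the sketch below delays whichever signal type has the shorter processing path by the difference between the two path durations. The function name `path_delay` and the millisecond figures are hypothetical, and the additional compensation for timing differences within the first media stream is deliberately left out.

```python
from typing import Tuple


def path_delay(audio_path_ms: float, image_path_ms: float) -> Tuple[str, float]:
    """Return which signal type to delay, and by how many milliseconds,
    so that both types leave their processing paths at the same time."""
    difference = image_path_ms - audio_path_ms
    if difference >= 0:
        # Image processing is slower (the general case): hold back the audio.
        return "audio", difference
    # Audio processing is slower: hold back the image information instead.
    return "image", -difference


# Assumed example figures: 15 ms for the audio path, 60 ms for the image path.
delayed_type, delay_ms = path_delay(audio_path_ms=15.0, image_path_ms=60.0)
```

With these assumed figures the audio of the second stream would be held back by 45 ms before combining.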

The separated types of information, i.e. audio and image, are now subjected to combining.
Audio information from both streams is combined, step 110, by a first combining unit 216.
A second combining unit 218 similarly combines image information from both streams,
step 112. The combining of audio information is performed by superposing audio
information of the first stream on audio information of the second stream. This combining
further includes weighting the properties of the audio information from the first stream and
the audio information from the second stream. This encompasses varying the proportion of
audio information from one stream in relation to the proportion of audio information from
the other stream.
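
A weighted superposition of this kind could look roughly as follows. The sketch assumes signed 16-bit PCM samples, equally long and already time-aligned sample sequences, and a single hypothetical parameter `first_share` giving the proportion taken from the first stream; none of these details are prescribed by the specification.

```python
from typing import List

PCM_MIN, PCM_MAX = -32768, 32767  # range of signed 16-bit PCM samples


def superpose(first: List[int], second: List[int], first_share: float) -> List[int]:
    """Weighted superposition of two equally long PCM sample sequences.

    first_share is the proportion (0.0 to 1.0) taken from the first stream;
    the remainder is taken from the second stream.
    """
    if not 0.0 <= first_share <= 1.0:
        raise ValueError("first_share must be between 0.0 and 1.0")
    if len(first) != len(second):
        raise ValueError("sample sequences must have equal length")
    mixed = []
    for a, b in zip(first, second):
        value = round(first_share * a + (1.0 - first_share) * b)
        # Clipping is a safeguard that keeps the result a valid 16-bit sample.
        mixed.append(max(PCM_MIN, min(PCM_MAX, value)))
    return mixed


voice = [1000, -2000, 3000]      # real time audio of the first stream
recording = [3000, 2000, -1000]  # stored audio of the second stream
mixed = superpose(voice, recording, first_share=0.5)
```

Setting `first_share` to 1.0 or 0.0 passes through only the first or only the second stream, which corresponds to the two ends of the 0 to 100% range of proportions.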

According to this preferred embodiment of the present invention the first combining unit
216 includes coding of the combined audio information to a suitable format such as AMR.

For image information the combining unit 218 combines image information from the first
stream with image information from the second stream by a process called α-blending,
which is well known to a person skilled in the art and will therefore not be further discussed
here. This combining of image information however includes weighting properties of the
image information from the first stream and the second stream. Similar to the combining of
audio information by the first combining unit 216, the weighting of properties within the second
combining unit 218 includes varying the proportion of image information from one stream
in relation to the proportion of image information from the other stream.
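
Although the blending itself is standard, the role of the weighting can be made concrete with a short sketch. It assumes frames represented as flat lists of (Y, U, V) tuples with 8-bit components and a hypothetical weight `alpha` giving the proportion of the first stream; an actual device would operate on its own frame buffers.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (Y, U, V) components, each in the range 0..255


def alpha_blend(first: List[Pixel], second: List[Pixel], alpha: float) -> List[Pixel]:
    """Blend two equally sized frames; alpha is the proportion of the first frame."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    if len(first) != len(second):
        raise ValueError("frames must contain the same number of pixels")
    return [
        tuple(round(alpha * a + (1.0 - alpha) * b) for a, b in zip(p, q))
        for p, q in zip(first, second)
    ]


live = [(200, 128, 128), (100, 90, 240)]   # frame from the camera (first stream)
stored = [(0, 128, 128), (100, 110, 40)]   # frame from memory (second stream)
blended = alpha_blend(live, stored, alpha=0.25)
```

Here a quarter of each output component comes from the live frame and three quarters from the stored frame.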

The weighting of properties of audio and image information is dependent on user input data
obtained from the user via the user input interface 202.

Moreover, according to said preferred embodiment the second combining unit 218
comprises coding the combined image information to a suitable format, such as MPEG-4.

The steps of combining information of the same type, from the two different streams, are now
followed by forming the output media stream, step 114, by the multiplexing unit 220.

This multiplexing unit 220 further contains synchronizing capabilities in order to achieve 
internal synchronizing between the two types of information from the first media stream, 
i.e. to synchronize the audio with the image information from this stream. This 
synchronizing takes into consideration any time difference between the audio information 
and the image information within the first stream. However, it also respects any time 
difference between the time required for audio information to pass the combining unit, on 
the one hand, and the time required for image information to pass the combining unit, on
the other hand. These required durations will typically depend on the presence of audio 
and/or image information in the media streams being combined. 
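
One way to picture the synchronizing performed while multiplexing is to shift the timestamps of one signal type by a known offset and then interleave both types in timestamp order. The sketch below does this with the standard library function `heapq.merge`; the tuple layout, the function name `multiplex` and the parameter `audio_offset_ms` are assumptions for illustration and are not taken from the specification.

```python
import heapq
from typing import List, Tuple

Unit = Tuple[int, str, bytes]  # (timestamp in ms, "audio" or "image", payload)


def multiplex(audio: List[Unit], image: List[Unit], audio_offset_ms: int = 0) -> List[Unit]:
    """Interleave two timestamp-ordered sequences into one output stream.

    audio_offset_ms shifts the audio timestamps to compensate for a known
    timing difference between the audio and image paths.
    """
    shifted = [(ts + audio_offset_ms, kind, data) for ts, kind, data in audio]
    return list(heapq.merge(shifted, image, key=lambda unit: unit[0]))


audio_units = [(0, "audio", b"a0"), (20, "audio", b"a1"), (40, "audio", b"a2")]
image_units = [(5, "image", b"f0"), (45, "image", b"f1")]
output = multiplex(audio_units, image_units, audio_offset_ms=10)
```

In this example the audio is moved 10 ms later, so the first image unit precedes the first audio unit in the output stream.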

Upon having formed the output media stream including information from the first media 
stream and synchronized information of the second stream, this combined multiplexed 
output stream is subjected to real time transmitting, step 116, by the transmitting unit 222. 

With reference to the portable communication device as shown in fig. 2, it is seen that the
control unit 204 is connected to all the other performing units, in order to control them, 
upon receiving user input data via the user input data interface 202. The step of generating 
the first media stream, step 104, providing the second media stream, step 106 and the 
steps of combining audio information, step 110, and combining image information, step 
112, require user input data. 

Furthermore, in order to delay the correct type of information, either audio or image 
information, feedback signaling is included between the second combining unit 212 and the
control unit 204 to adjust the delay subject to the correct type of information in the delaying 
unit 214. 

The invention can be varied in many ways, for instance:

The first media stream can comprise only image information, only audio information or a 
combination of both. Also the second media stream can comprise only image information, 
only audio information or a combination of both. All these different variations of the first and 
second media streams can be combined. The memory unit can be either fixed in the device 
or be an easily replaceable unit, such as a memory stick or another memory unit that is 
connectable to or insertable in the portable communication device. 

Image information can furthermore be provided within the first or second stream, as moving 
pictures, or a combination of both still pictures and moving pictures. 

Processing of audio information from the second media stream can be more time consuming
than processing of image information from the same stream, which means that image
information of the second stream needs to be delayed in relation to the audio information, in
order to obtain an output media stream containing synchronized information.

WO 2005/004450 PCT/EP2004/006226

The second media stream may furthermore contain audio and image information coded by
using any of a large number of different codes. The first and second combining units may
furthermore have coding capabilities to encode the superposed audio information and the
blended image information in a large number of different formats.

According to another embodiment, the first media stream is provided as a single multiplexed
first media stream from a single media stream generating unit. In this case, an additional
unit, a demultiplexing unit, is needed to demultiplex this multiplexed media stream, prior to
the steps of combining audio information and image information separately.

Another possible variation is to execute the steps according to the method in a different 
order. 

It is furthermore possible to form an output media stream from more than two media
streams, as well as to form an output media stream having information of more than two
different types. It is furthermore possible to form an output media stream by combining
information from multimedia streams.

According to yet another embodiment of the present invention a first and a second
real time media stream are combined. In this case the second media stream is provided to
the portable communication device in real time. One example of this embodiment is
combining one real time media stream, from a camera mounted for instance on the front of
a portable device, with another real time media stream from a camera mounted for instance
on the back side of the same portable device. Holding the portable device in one's hand with
a stretched out arm, standing for instance in front of a sight-seeing spot, with the two
cameras directed in different or opposite directions, enables one to combine the media
stream containing audio and image information of oneself with the second stream containing
audio and image information of one's current location, i.e. the sight-seeing spot. It is thus
easy and convenient to include oneself in a real time stream containing audio and image
information, without the need of finding a second person for assistance.

There has thus been described a method and a device for forming a
real time output media stream by mixing a first media stream with a second media stream.

The provision of mixing of media streams provides a number of attractive functions, for
instance:

A user of a mobile device can, instead of separately sending video camera images to a
communicating party, transmit a pre-recorded video or sound while mixing said
pre-recorded video with voice or audio information. As this mixing is performed in real time
"on the fly" the receiving party might get the impression that the user is in another location
than he actually is, for instance at a luxurious vacation resort.

This effect can furthermore be enhanced by mixing moving image information, such as the
face of the user, into said pre-recorded video.

This is also applicable in other situations, when for instance the communicating party is not 
allowed to see the location for security reasons. 

Upon receiving a video phone call the user can, instead of sending real time video camera 
images from his camera to the calling party, decide to play a pre-recorded video answering
message, containing moving or still pictures, stored in memory. This feature can thus be used
as a mobile video answering machine. This can be useful since the user may not want to 
turn on his live video camera, when answering a VT call but still have the possibility of 
receiving pictures from a calling party. 

During a conversation a user of the communication device can instantly share content
information such as video or still images, the VT session serving as a bearer for exchanging
media files and allowing for simultaneous multimedia.

Another example is sending a pre-recorded video file during start up of a VT session, where
said file can contain advertisements, qualifying for reduced communication tariffs.






